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Health Sciences 

Researchers are often interested in drawing inferences regarding 
the order between two experimental groups on the basis of multivari- 
ate response data. Since standard multivariate methods are designed 
for two-sided alternatives, they may not be ideal for testing for order 
between two groups. In this article we introduce the notion of the 
linear stochastic order and investigate its properties. Statistical the- 
ory and methodology are developed to both estimate the direction 
which best separates two arbitrary ordered distributions and to test 
for order between the two groups. The new methodology generalizes 
Roy's classical largest root test to the nonparametric setting and is 
applicable to random vectors with discrete and/or continuous compo- 
nents. The proposed methodology is illustrated using data obtained 
from a 90-day pre-chronic rodent cancer bioassay study conducted 
by the National Toxicology Program (NTP). 



1. Introduction. In a variety of applications researchers are interested in 
comparing two treatment groups on the basis of several, potentially depen- 
dent outcomes. For example, to evaluate if a chemical is a neuro-toxicant, 
toxicologists compare a treated group of animals with an untreated control 
group in terms of various correlated outcomes such as tail-pinch response, 
click response and gait score, etc.; cf. Moser (2000). The statistical problem 
of interest is to compare the multivariate distributions of the outcomes in 
the control and treatment groups. Moreover, the outcome distributions are 
expected to be ordered in some sense. The theory of stochastic order rela- 
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tions [Shaked and Shanthikumar (2007)] provides the theoretical foundation 
for such comparisons. 

To fix ideas let X and Y be p-dimensional random variables (RVs); 
X is said to be smaller than Y in the multivariate stochastic order, de- 
noted X ^ st Y, provided P(X eU)< P(Y E U) for all upper sets U eW 
[Shaked and Shanthikumar (2007)]. If for some upper set the above inequal- 
ity is sharp, we say that X is strictly smaller than Y (in the multivariate 
stochastic order) which we denote by X -< st Y. Recall that a set U E MP 
is called an upper set if u E U implies that v E U whenever u < v, that 
is, if iti < Vi, i = 1, . . . ,p. Note that comparing X and Y with respect to 
the multivariate stochastic order requires comparing their distributions over 
all upper sets in MP. This turns out to be a very high-dimensional prob- 
lem. For example, if X and Y are multivariate binary RVs, then X ^ st Y 
provided X^teC/^ x (*) — Ste[/^Y(t) wnere Px(t) and py(t) are the corre- 
sponding probability mass functions. Here U £U p where IA V is the family 
of upper sets defined on the support of a p-dimensional multivariate binary 
RV. It turns out that the cardinality of U p , denoted by N p , grows super- 
exponentially with p. In fact iV\ = 1, N 2 = 4, N 3 = 18, iV 4 = 166, N 5 = 7579 
and Nq = 7,828,352. The values of N? and Ng are also known, but iVg is not. 
However, good approximations for N p are available for all p; cf. Davidov and 
Peddada (2011). Obviously the number of upper sets for general multivari- 
ate RVs is much larger. Since in many applications p is large, it would seem 
that the analysis of high-dimensional stochastically ordered data is practi- 
cally hopeless. As a consequence, the methodology for analyzing multivariate 
ordered data is underdeveloped. It is worth mentioning that Sampson and 
Whitaker (1989) as well as Lucas and Wright (1991) studied stochastically 
ordered bivariate multinomial distributions. They noted the difficulty of ex- 
tending their methodology to high-dimensional data due to the large num- 
ber of constraints that need to be imposed. Recently Davidov and Peddada 
(2011) proposed a framework for testing for order among K, p-dimensional, 
ordered multivariate binary distributions. 

In this paper we address the dimensionality problem by considering an 
easy to understand stochastic order which we refer to as the linear stochastic 
order. 

Definition 1.1. The RV X is said to be smaller than the RV Y in 
the (multivariate) linear stochastic order, denoted X ^;- s t Y, if for all s E 
M p + = {s:s > 0}, 

(1.1) s T X^ st s T Y, 

where ^ s t in (1.1) denotes the usual (univariate) stochastic order. 

Note that it is enough to limit (1.1) to all nonnegative real vectors sat- 
isfying ||s|| = 1, and accordingly we denote by ST~ the positive part of the 
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unit sphere in MP. We call each s G S^T a "direction." In other words the 
RVs X and Y are ordered by the linear stochastic order if every nonnegative 
linear combination of their components is ordered by the usual (univariate) 
stochastic order. Thus instead of considering all upper sets in W we need 
for each s £ 5^7 1 to consider only upper sets in R. This is a substantial re- 
duction in dimensionality. In fact we will show that only one value of s need 
be considered. Note that the linear stochastic order, like the multivariate 
stochastic order, is a generalization of the usual univariate stochastic order 
to multivariate data. Both of these orders indicate, in different ways, that 
one random vector is more likely than another to take on large values. In this 
paper we develop the statistical theory and methodology for estimation and 
testing for linearly ordered multivariate distributions. For completeness we 
note that weaker notions of the linear stochastic order are discussed by Hu, 
Homem-de Mello and Mehrotra (2011) and applied to various optimization 
problems in queuing and finance. 

Comparing linear combinations has a long history in statistics. For exam- 
ple, in Phase I clinical trials it is common to compare dose groups using an 
overall measure of toxicity. Typically, this quantity is an ad hoc weighted av- 
erage of individual toxicities where the weights are often known as "severity 
weights;" cf. Bekele and Thall (2004) and Ivanova and Murphy (2009). This 
strategy of dimension reduction is not new in the statistical literature and 
has been used in classical multivariate analysis when comparing two or more 
multivariate normal populations. For example, using the union-intersection 
principle, the comparison of multivariate normal populations can be reduced 
to the comparison of all possible linear combinations of their mean vectors. 
This approach is the basis of Roy's classical largest root test [Roy (1953), 
Johnson and Wichern (1998)]. Our proposed test may be viewed as nonpara- 
metric generalization of the classical normal theory method described above 
with the exception that we limit consideration only to nonnegative linear 
combinations (rather than all possible linear combinations) since our main 
focus is to make comparisons in terms of stochastic order. We emphasize 
that the linear stochastic order will allow us to address the much broader 
problem of directional ordering for multivariate ordered data, that is, to find 
the direction which best separates two ordered distributions. Based on our 
survey of the literature, we are not aware of any methodology that addresses 
the problems investigated here. 

This paper is organized in the following way. In Section 2 some proba- 
bilistic properties of the linear stochastic order are explored, and its rela- 
tionships with other multivariate stochastic orders are clarified. In Section 3 
we provide the background and motivation for directional inference under 
the linear stochastic order and develop estimation and testing procedure for 
independent as well as paired samples. In particular the estimator of the 
best separating direction is presented and its large sampling properties de- 
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rived. We note that the problem of estimating the best separating direction 
is a nonsmooth optimization problem. The limiting distribution of the best 
separating direction is derived in a variety of settings. Tests for the linear 
stochastic order based on the best separating direction are also developed. 
One advantage of our approach is that it avoids the estimation of multivari- 
ate distributions subject to order restrictions. Simulation results, presented 
in Section 4, reveal that for large sample sizes the proposed estimator has 
negligible bias and mean squared error (MSE). The bias and MSE seem to 
depend on the true value of the best separating direction, the dependence 
structure and the dimension of the problem. Furthermore, the proposed test 
honors the nominal type I error rate and has sufficient power. In Section 5 we 
illustrate the methodology using data obtained from the National Toxicol- 
ogy Program (NTP). Concluding remarks and some open research problems 
are provided in Section 6. For convenience all proofs are provided in the 
Appendix where additional concepts are defined when needed. 

2. Some properties of the linear stochastic order. We start by clarify- 
ing the relationship between the linear stochastic order and the multivari- 
ate stochastic order. First note that X ^j. s t Y if and only if P(s T X > t) < 
P(s T Y > t) for all (t, s)eRxl p + which is equivalent to P(X G H) < P(Y £ 
H ) for all H £ % where % is the collection of all upper half-planes, that is, 
sets which are both half planes and upper sets. Thus X ^ st Y =^ X ^i- s t Y. 
The converse does not hold in general. 

Example 2.1. Let X and Y be bivariate RVs such that P(X = (1, 1)) = 
P(X = (0, 1)) = P(X = (1, 0)) = 1/3 and P(Y = (3/4, 3/4)) = P(Y = (1, 2)) = 
P(Y = (2, 1)) = 1/3. It is easy to show that X is smaller than Y in the linear 
stochastic order but not in the multivariate stochastic order. 

The following theorem provides some closure results for the linear stochas- 
tic order. 

Theorem 2.1. (i) //X -<ist Y, then g(X.) ^. st g(Y) for any affine in- 
creasing function; (ii) if X dil-et Y, then X/ ^- s t Y/ for each subset I £ 
{1, . . . ,p}; (iii) if X|Z = z ^- s t Y|Z = z for all z in the support of Z, then 
X ^ist Y; (iv) j/Xi,...,X n are independent RVs with dimensions pi and 
similarly for Yi , . . . , Y n and if in addition Xj ^/- s t Yj, then (Xi, . . . ,X n ) ^- s t 
(Yi, . . . , Y n ); (v) finally, if X n — > X andY n — > Y where convergence can 
be in distribution, in probability or almost surely and if X n ^- s t Y n for all 
n, then X ^/- s t Y. 

Theorem 2.1 shows that the linear stochastic order is closed under in- 
creasing linear transformations, marginalization, mixtures, conjugations and 
convergence. In particular parts (ii) and (iii) of Theorem 2.1 imply that if 
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X ;^- s t Y, then X{ Yi and Xi + Xj ^ st Yi + Yj for all i and j; that is, 
all marginals are ordered as are all convolutions. Although the multivariate 
stochastic order is in general stronger than the linear stochastic order, there 
are situation in which both orders coincide. 

Theorem 2.2. Let X and Y be continuous elliptically distributed RVs 
supported on W with the same generator. Then X ^/- s t Y if and only if 
X ± at Y. 

Note that the elliptical family of distributions is large and includes the 
multivariate normal, multivariate t and the exponential power family; see 
Fang, Kots and Ng (1989). Thus Theorem 2.2 shows that the multivariate 
stochastic order coincides with the linear stochastic order in the normal 
family. Incidentally, in the proof of Theorem 2.2 we generalize the results of 
Ding and Zhang (2004) on multivariate stochastic ordering of elliptical RVs. 
Another interesting example is the following: 

Theorem 2.3. Let X and Y be multivariate binary RVs. Then X :^- s t 
Y is equivalent to X ^ st Y if and only if p< 3. 

Remark 2.1. In the proof of Theorem 2.2 distributional properties of 
the elliptical family play a major role. In contrast, Theorem 2.3 is a conse- 
quence of the geometry of the upper sets of multivariate binary RVs which 
turn out to be upper half planes if and only if p < 3. 

We now explore the role of the dependence structure. 

Theorem 2.4. Let X and Y have the same copula. Then X ^;- s t Y if 
and only if X ^ s t Y. 

Theorem 2.4 establishes that if two RVs have the same dependence struc- 
ture as quantified by their copula function [cf. Joe (1997)], then the linear 
and multivariate stochastic orders coincide. Such situations arise when the 
correlation structure among outcomes is not expected to vary with dose. 

The orthant orders are also of interest in statistical applications. We say 
that X is smaller than Y in the upper orthant order, denoted X ^ uo Y, 
if P(X eO)< P(Y G O) for all O G O where O is the collection of upper 
orthants, that is, sets of the form {z : z > x} for some fixed x S MP. The lower 
orthant order is similarly defined; cf. Shaked and Shanthikumar (2007) or 
Davidov and Herman (2011). It is obvious that the orthant orders are weaker 
than the usual multivariate stochastic order, that is, X ^ st Y => X ^ uo Y 
and X -<\ Q Y. In general the linear stochastic order does not imply the upper 
(or lower) orthant order, nor is the converse true. However, as stated below, 
under some conditions on the copula functions, the linear stochastic order 
implies the upper (or lower) orthant order. 
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Theorem 2.5. //X ±. 8t Y and C x (u) < CV(u) for all u <E [0, If, then 
X Y. Similarly i/X ^;- s t Y and Cx(u) < CV(u) /or a// u E [0, l] p , i/ien 
X^ uo Y. 

Note that Cx(u) and Cx(u) above are the copula and tail-copula func- 
tions for the RV X [cf. Joe (1997)] and are defined in the Appendix and sim- 
ilarly for Cy(u) and CV( U )- Further note that the relations Cx(u) < CV(u) 
and/or Cx(u) < CV(u) indicate that the components of Y are more strongly- 
dependent than the components of X. This particular dependence ordering 
is known as positive quadrant dependence. It can be further shown that 
strong dependence and the linear stochastic order do not in general imply 
stochastic ordering. 

Additional properties of the linear stochastic order as they relate to esti- 
mation and testing problems are given in Section 3. 

3. Directional inference. 

3.1. Background and motivation. There exists a long history of well- 
developed theory for comparing two or more multivariate normal (MVN) 
populations. Methods for assessing whether there are any differences be- 
tween the populations [which differ? in which component(s)? and by how 
much?] have been addressed in the literature using a variety of simultane- 
ous confidence intervals and multiple comparison methods; cf. Johnson and 
Wichern (1998). Of particular interest to us is Roy's largest root test. To 
fix ideas consider two multivariate normal random vectors X and Y with 
means /i and is, respectively, and a common variance matrix S. Using the 
union-intersection principle Roy (1953) expressed the problem of testing 
Hq : fx = is versus Hi : fi ^ is as a collection of univariate testing problems, 
by showing that Hq and Hi are equivalent to f] s eiRp -^o,s and U s gRp ^i,s 
where Hq :S : s T fj, = s T u and Hi^ s : s T /x ^ s T u. Implicitly Roy's test identifies 
the linear combination s^^u — fi) that corresponds to the largest "dis- 
tance" between the mean vectors, that is, the direction which best separates 
their distributions. The resulting test, known as Roy's largest root test, is 
given by the largest eigenvalue of BS" 1 where B is the matrix of between 
groups (or populations) sums of squares and cross products, and S is the 
usual unbiased estimator of S. In the special case when there are only two 
populations, this test statistic is identical to Hotelling's T 2 statistic. From 
the simultaneous confidence intervals point of view, the critical values de- 
rived from the null distribution of this statistic can be used for constructing 
Scheffe's simultaneous confidence intervals for all possible linear combina- 
tions of the difference (ft — is). Further note that the estimated direction 
corresponding to Roy's largest root test is S~ 1 (Y — X) where X and Y are 
the respective sample means. 
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Our objective is to extend and generalize the classical multivariate method, 
described above, to nonnormal multivariate ordered data. Our approach 
will be nonparametric. Recall that comparing MVNs is done by consider- 
ing the family of statistics T n ^ m {s) = s T (Y — X) for all s £ W. In the case 
of nonnormal populations, the population mean alone may not be enough 
to characterize the distribution. In such cases, it may not be sufficient to 
compare the means of the populations but one may have to compare entire 
distributions. One possible way of doing so is by considering rank statistics. 
Suppose Xi, . . . ,X n and Yi, . . . , Y m are independent random samples from 
two multivariate populations. Let 

n m 

Rk(s) = X) I (s T X i <s T X fe ) + XX^Y^X*) 
i=l j=l 

be the rank of s T Xfc in the combined sample s r Xi, . . . , s T ~K n , s^Yi, . . . , s T Y„ 
For fixed s € MP the distributions of s T X and s T Y can be compared using 
a rank test. For example, if we use W n ^ m {s) = Y17=i Ri( s ) our comparison is 
done in terms of Wilcoxon's rank sum statistics. It is well known that rank 
tests are well suited for testing for univariate stochastic order [cf. Hajek, 
Sidak and Sen (1999), Davidov (2012)] where the restrictions that s s5^~ 
must be made. Although any rank test can be used, the Mann- Whitney form 
of Wilcooxon's (WMW) statistic is particularly attractive in this applica- 
tion. Therefore in the rest of this paper we develop estimation and testing 
procedures for the linear stochastic order based on the family of statistics 

^ n m 

*«>-( s ) = — EEVx,< s ^). 
i=i j=i 

where s varies over . Note that (3.1) unbiasedly estimates 
(3.2) *(s) =P(s T X<s T Y). 

The following result is somewhat surprising. 

Proposition 3.1. Let X and Y be independent MVNs with means [i < 
v and common variance matrix S. Then Roy's maximal separating direction 
S _1 (i/ — /x) also maximizes P(s T X < s T Y). 

Proposition 3.1 shows that the direction which separates the means, in 
the sense of Roy, also maximizes (3.2). Thus it provides further support for 
choosing (3.1) as our test statistic. Note that in general XI -1 [y — /x) may not 
belong to S 1 ! . Since we focus on the linear statistical order, we restrict our- 
selves to s E S^T ■ Consequently wg define s max ; — argmax gS p-i ^(s) and 
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refer to s max as the best separating direction. Further note that if X and 
Y are independent and continuous and if X i^-st Y, then ^(s) > 1/2 for all 
s £ S r ^T ■ This simply means that s r X tends to be smaller than s T Y more 
than 50% of the time. Note that probabilities of type (3.2) were introduced 
by Pitman (1937) and further studied by Peddada (1985) for comparing esti- 
mators. Random variables satisfying such a condition are said to be ordered 
by the precedence order [Arcones, Kvam and Samaniego (2002)]. 

Once s max is estimated we can plug it into (3.1) to get a test statistic. 
Hence our test may be viewed as a natural generalization of Roy's largest 
root test from MVNs to arbitrary ordered distributions. However, unlike 
Roy's method, which does not explicitly estimate s max , we do. On the other 
hand the proposed test does not require the computation of the inverse of 
the sample covariance matrix whereas Roy's test and Hotteling's T 2 test 
require such computations. Consequently, such tests cannot be used when 
n < p whereas our test can be used in all such instances. 

Remark 3.1. In the above description Xj and Yj are independent for 
all i and j and therefore the probability P(s T Xj < s T Yj) is independent of 
both i and j. However, in many applications such as repeated measurement 
and crossover designs, the data are a random sample of dependent pairs 
(Xi, Yi), . . . , (Xtv, Yjv) for which Zj = Yj — Xj are i.i.d. For example, such 
a situation may arise when Yj = Y^ + ej and Xj = X^ + ej, where ej are 
pair-specific random effects and the RVs Y' { (as well as X-) are i.i.d. In 
this situation P(s T Xj < s r Yj) is independent of i and s max is well defined. 
Moreover the objective function analogous to (3.1) is 

1 N 

(3-3) *Ar(s) = ^^V Xi < s T Yi) . 

i=i 

In the following we consider both sampling designs which we refer to as: 
(a) independent samples and (b) paired or dependent samples. Results are 
developed primarily for independent samples, but modification for paired 
samples are mentioned as appropriate. 

3.2. Estimating the best separating direction. Consider first the case of 
independent samples, that is, Xi, . . . , X n and Yi, . . . , Y m are random sam- 
ples from the two populations. Rewrite (3.1) as 

^ 71 m 

( 3 - 4 ) *n, m (s) = — EEVZ^O), 

i=l j=l 

where Zjj = Yj — Xj. The maximizer of (3.4) is denoted by s max , that is, 

(3.5) s max = arg max ^ n , m (s). 

eeST 
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Finding (3.5) with s G is a nonsmooth optimization problem. Consider 
first the situation where p = 2. In this case we maximize (3.4) subject to 
s G 5+ = {(si, S2) : s\ + s| = 1, (si, S2) > 0}. Geometrically 5+ is a quarter 
circle spanning the first quadrant. Now let Z = (Z\,Z2), and without any 
loss of generality assume that ||Z|| = 1. We examine the behavior of the 
function I( s t z >o) as a function of s. Clearly if Z > 0, that is, if Z\ > 0, Z2 > 0, 
then for all s G 5j. we have I( s tz> ) = 1. In other words any value of s on 
the arc iSj. maximizes I( s tz> ). Similarly if Z < then for all s G we 
have I( s tz> ) = an< ^ again the entire arc 5+ maximizes I( s tz> ). Now let 
Z\>0 and Z2 < 0. It follows that I( s rz>o) = 1 provided cos(s T Z) > 0. Thus 
I(s T z>o) = 1 f° r au s on the arc [0, 6] for some 0. If Z\ < and let Z2>0 the 
situation is reversed and I( s tz>o) = 1 f° r au angles s on the arc [0,7r/2]. The 
value of 6 is given by (3.6). In other words each Z is mapped to an arc on 
<Sl as described above. Now, the function (3.4) simply counts the number of 
arcs covering each s G S+. The maximizer of (3.4) lies in the region where the 
maximum number of arcs overlap. Clearly this implies that the maximizer 
of (3.4) is not unique. A quick way to find the maximizer is the following: 

Algorithm 3.1. Let M denote the number of Zy's which belong to 
the second or fourth quadrant. Map 

/o R \ 7 _ Jvr/2-cos~ 1 (Z iji i), if Z ijtl > 0, Z ijj2 < 0, 

1 j ij ij ~ \ cos-H^i) - vr/2, if Z ijt 1 < 0, Z lh 2 > 0. 

Relabel and order the resulting angles as 0m < • • • < 6*[ M ] . Also define 0r o i = 
and 9\m+i] = 7r /2- Evaluate ^ nj . m (s[j]) i = 1, . . . , M where Sm 1 = cos(0m) and 
Sfji2 = sin(0u). If a maximum is attained at 9u-\, then any value in [^[j-i]) ^[j]] 
or [0\j], maximizes (3.1). 

In light of the above discussion we can be easily prove the following: 

PROPOSITION 3.2. Forp = 2 Algorithm 3.1 maximizes (3.1). 

In the general case, that is, for p > 3, each observation Zjj is associated 
with a "slice" of S^T 1 . The boundaries of the slice are the intersection of 
1 and some half-plane. Note that when p = 2 the slices are arcs. The 
shape of the slice depends on the quadrant to which Zjj belongs. The maxi- 
mizer of (3.1) is again the value of s which belongs to the largest number of 
slices. Although the geometry of the resulting optimization problem is easy 
to understand, we have not been able to devise a simple algorithm, which 
scales with p, based on the ideas above. However, we have found that (3.5) 
can be obtained by converting the data into polar coordinates and then using 
the Nelder-Mead algorithm which does not require the objective function 
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to be differentiable. We emphasize that this maximization process results 
in a single maximizer of (3.1) and we do not attempt to find the entire set 
of maximizers. For completness we note that there are methods for opti- 
mizing (3.1) specifically designed for nonsmooth problems. For more details 
see Price, Reale and Robertson (2008) and Audet, Bechard and Le Diga- 
bel (2008) and the references therein for both algorithms and convergence 
results. 

Remark 3.2. It is clear that the estimation procedure for paired sam- 
ples is the same as for independent samples. 

3.3. Large sample behavior. We find three different asymptotic regimes 
for s max depending on distributional assumptions and the sampling scheme 
(paired versus independent samples). 

Note that the parameter space is the unit sphere not the usual Euclidian 
space. There are several ways of dealing with this irregularity, one of which is 

to re-express the last coordinate of s G as s p = yjl — s\ — ■ ■ ■ — s^_ 1 and 

consider the parameter space {s > : si H — • + s p _ 1 < 1} which is a compact 
subset of M p_1 . Clearly these parameterizations are equivalent, and without 
any further ambiguity we will denote them both by Thus in the proofs 

below both views of are used interchangeably as convenient. 

We begin our discussion with independent samples assuming continuous 
distributions for both X and Y. 

Theorem 3.1. Let X and Y have continuously differentiable densities. 
If^(s) is uniquely maximized by s max G interior (S^T 1 ). Then s max , the max- 
imizer of (3.1 ), is strongly consistent, that is, s max — y s max . Furthermore 
Smax = s max + O p (N~ 1 ^ 2 ) where N = n + m. Finally, if n/(n + m) — > A € 
(0,1), then 

iV 1 / 3 (s nu «-s nu «)=^JV(0,E), 
where the matrix 5] is defined in the body of the proof. 

Although (3.1) is not continuous (nor differentiable) its [/-statistic struc- 
ture guarantees that it is "almost" so [i.e., it is continuous up to an o p (l/N) 
term] , and therefore its maximizer converges at a y/~N rate to a normal limit 
[Sherman (1993)]. We also note that it is difficult to estimate the asymptotic 
variance S directly since it depends on unknown functions (Vtpj and V 2 ^' 
for j = 1,2 are defined in the body of the proof). Nevertheless bootstrap 
variance estimates are easily derived. 



Remark 3.3. Note that if either X or Y are continuous RVs, then ^(s) 
is continuous. This is a necessary condition for the uniqueness of s max . 



INFERENCE FOR THE LINEAR STOCHASTIC ORDER 



11 



We have not been able to find general condition(s) for a unique maximizer 
for ^(s), although important sufficient conditions can be found. For example: 

Proposition 3.3. If Z = Y — X and there exist 8 = v — fx > and £ 

so the distribution of 

s T Z - s T S 

is independent of s, then the maximizer offy(s) is unique. 

The condition above is satisfied by location scale families, and it may be 
convenient to think of S and 5] as the location and scale parameters for Z. In 
general, however, \&(s) may not have a unique maximum nor be continuous. 
For example, if both X and Y are discrete RVs, then \l/(s) is a step function. 
In such situations s max is set valued, and we may denote it by S max . As we 
have seen earlier the maximizer of vEVi,m( s ) is always set valued (typically, 
however, we find only one maximizer). Consider, for example, the case where 
P(X = (-1, -1)) = P(X = (1, 1)) = 1/2 and let P(Y = (-1,-1)) = 1/2 - e, 
and P(Y = (1, 1)) = 1/2 + e for some e > 0. It is clear that X -< st Y. Further 
note that Z 6 {(2,2), (0,0), (-2, -2)}, and it follows that *(s) is constant on 
<S+ which implies that S max coincides with 5^J_. Similarly ^ n ^ m (s) is constant 
on S\_ and therefore s max G S max for all n, m. This means that consistency is 
guaranteed and the limiting distribution is degenerate. More generally, we 
have: 

Theorem 3.2. //X and Y have discrete distributions with finite sup- 
port and s max is a maximizer of (3.1), then 

P(s max i S max ) < d exp(-C 2 iV) 

for some positive constants C\ and C^. 

Theorem 3.2 shows that the probability that a maximizer of ^ n ,m is not 
in S max is exponentially small when the underlying distributions of X and 

Y are discrete. Hence s max is consistent and converges exponentially fast. 
In fact the proof of Theorem 3.2 implies that S max — > S max ; that is, the set 
of maximizers of v & n ,m(') converges to the set of maximizers of \t'(-); that is, 
PH(Smax, S max ) — > where pu is the Hausdroff metric defined on compact 
sets. A careful reading of the proof shows that Theorem 3.2 also holds under 
paired samples. 

Finally, we consider the case of continuous RVs under paired samples. 
Then under the conditions of Theorem 3.1 and provided the density of Z = 

Y — X is bounded we have: 

Theorem 3.3. Under the above mentioned conditions s max; the maxi- 
mizer of (3.3), is strongly consistent, that is, s max — >' s max; converges at a 
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cube root rate, that is, s max = s max + O p (N 1 ' 3 ), and 

N ^ (s ma x S max ) =^ W, 

where W has the distribution of the almost surely unique maximizer of the 
process s i — > -[Q(s)+W(s)] onSl' 1 whereQis) is a quadratic function and 
W(s) is a zero mean Gaussian process described in the body of the proof. 

Theorem 3.3 shows that in paired samples s max is consistent, but in con- 
trast with Theorem 3.1 it converges at a cube-root rate to a nonnormal 
limit. The cube root rate is due to the discontinuous nature of the objec- 
tive function (3.3). General results dealing with this kind of asymptotics for 
independent observations are given by Kim and Pollard (1990). The main 
difference between Theorems 3.1 and 3.3 is that the objective function (3.1) 
is smoothed by its [/-statistic structure while (3.3) is not. 

3.4. A confidence set for s max . Since the parameter space is the surface 
of a unit sphere it is natural to define the (1 — a) x 100% confidence set for 
Smax centered at s max 

by 

{s € 5+ 1 : s max s < C Qi tv}, 

where C Qj tv satisfies P(s„ iax s < C Qj tv) = 1 — a. For more details see Fisher 
and Hall (1989) or Peddada and Chang (1996). Hence the confidence set is 
the set of all s £ 1 which have a small angle with s max . In theory one may 
appeal to Theorem 3.1 to derive the critical value for any a G (0, 1). However 
the limit law in Theorem 3.1 requires knowledge of unknown parameters and 
functions. For this reason, we explore the bootstrap for estimating Ca,N • 

Remark 3.4. Since in the case of paired samples, the estimator con- 
verges at cube root rate rather than the square root rate, the standard boot- 
strap methodology may yield inaccurate coverage probabilities; see Abrevaya 
and Huang (2005) and Sen, Banerjee and Woodroofe (2010). For this reason 
we recommend the "M out of N" bootstrap methodology. For further dis- 
cussion on the "M out of N" bootstrap methodology one may refer to Lee 
(1999), Delagdo, Rodriguez-Poo and Wolf (2001), Bickel and Sakov (2008). 

3.5. Testing for order. Consider first the case of independent samples 
where interest is in testing the hypothesis 

(3.7) # :X= st Y versus #i:X^ st Y. 

Thus (3.7) tests whether the distributions of X and Y are equal or ordered 
(later on we briefly discuss testing Hq :X ^ st Y versus Hi :X 7^ st Y). In 
this section we propose a new test for detecting an ordering among two 
multivariate distributions based on the maximal separating direction. The 
test is based on the following observation: 
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Theorem 3.4. Let X and Y be independent and continuous RVs. If 
X = at Y, then P(s T X < s T Y) = 1/2 for all s G S£~\ and if both (i) X r< st Y 

and (ii) P(s T X < s T Y) > 1/2 for some s G S^ 1 hold, then X -< st Y. 

Theorem 3.4 says that if it is known a priori that {X ^ st Y} = {X = st 
Y} U {X -< st Y}, that is, the RVs are either equal or ordered [which is exactly 
what (3.7) implies], then a strict linear stochastic ordering implies a strict 
ordering by the usual multivariate stochastic order. In particular under the 
alternative there must exist a direction s G S^T 1 for which s T X -</- s t s T Y. 

Remark 3.5. The assumption that X ^ st Y is natural in applications 
such as environmental sciences where high exposures are associated with 
increased risk. Nevertheless if the assumption that X ^ st Y is not warranted 
then the alternative hypothesis formulated in terms of the linear stochastic 
order actually tests whether there exists a s G <S+ 1 for which P(s T X < 
s T Y) > 1/2. This amounts to a precedence (or Pitman) ordering between 
s T X and s T Y. 

Remark 3.6. In the proof of Theorem 3.4 we use the fact that given 
that X ^ st Y we have X -< st Y provided Xi -< st Yj for some 1 < i < p. Note 
that if Xi -< st Yi, then E(X») < E(Yj). Thus it is possible to test (3.7) by 
comparing means (or any other monotone function of the data). Although 
such a test will be consistent it may lack power because tests based on 
means are often far from optimal when the data is not normally distributed. 
The WMW procedure, however, is known to have high power for a broad 
collection of underlying distributions. 

Hence (3.7) can be reformulated in terms of the linear stochastic. In par- 
ticular it justifies using the statistic 

(3.8) S n , m = iV 1/2 (^„, m (s max ) - 1/2). 

To the best of our knowledge this is the first general test for multivariate 
ordered distributions. In practice we first estimate s max and then define 
Ui = Cax-^i an d Vj = s^ ax Yj where i = 1, . . . , n and j = 1, . . . , m. Hence 
(3.8) is nothing but a WMW test based on the U"s and V's. It is also a 
Kolmogorov-Smirnov type test. 

The large sample distribution of (3.8) is given in the following. 

Theorem 3.5. Suppose the null (3.7) holds. Letn,m— > oo andn/{n + 
m) — > A G (0,1). Then 

S n .m =>S= sup G(s), 

where <G(s) is a zero mean Gaussian process with covariance function given 
by (A. 20). 
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Remark 3.7. Since s max -4' s max by Slutzky's theorem the power of 
test (3.8) converges to the power of a WMW test comparing the samples 

(sL x i.-"i s max X n) and (sL Y i , ■ ■ ■ > s L x Y m) ■ The "synthetic" test, as- 
suming that s max is known, serves as a gold standard as verified by our 
simulation study. 

Remark 3.8. Furthermore, the power of the test under local alterna- 
tives, that is, when Y = st X + N~ l l 2 8 and ./V — > oo is bounded by the 
power of the WMW test comparing the distributions of s^ ax X and s^ ax Y = 

Alternatives to the "sup" statistic (3.8) are the "integrated" statistics 

In,m = [ [iV 1/2 (^„, m (s) - 1/2)] ds and 
./sesr 1 

(3.9) 

l£m= [ [iV 1/2 (^n, m (s)-l/2)] + ds, 

where [x] + = max(0, x). It is clear that I n ,m N(0, a 2 ) where 
a 2 = I I C(u,v)dudv 



and C(u, v), the covariance function of G, is given by (A. 20). Also 

I+ m ^ [ [G(s)] + ds. 

This distribution does not have a closed form. The statistics l n ^ m and I^m 
have univariate analogues; cf. Davidov and Herman (2012). Finally, we have 
the following theorem: 

Theorem 3.6. The tests (3.8) and (3.9) are consistent. Furthermore if 
X ;^- s t Y ;^z- s t Z, then all three tests for Hq : X = st Z versus Hi : X -< st Z are 
more powerful than the respective tests for Hq : X = s t Y versus Hi : X -< st Y. 

Theorem 3.6 shows that the tests are consistent and that their power 
function is "monotone" in the linear stochastic order. 

Remark 3.9. Qualitatively similar results are obtainable in the paired 
sampling case; the only difference being the limiting process. For example, 
it easy to see that the paired sample analogue of (3.8) satisfies 



sup 
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where Q(s) is the empirical process on S^T associated with (3.3). Analogues 
of I n ,m an d Inm are similarly defined and analyzed. Tests for paired samples 
may be similarly implemented using bootstrap or permutation methods. 

4. Simulations. For simplicity of exposition, and motivated by the fact 
that the example we analyzed in this paper deals with independent samples, 
we limit our simulations to the case of independent samples. 

4.1. The distribution o/s max . We start by investigating the distribution 
of s max by simulation. For simplicity we choose p — 3 and generated Xj 
(i = 1, . . . , n) distributed as AT 3 (0, E) and Yj (j = 1, . . . , m) distributed as 
Ns(S, 5]) where 5] = (1 — p)l + pj, I is the identity matrix and J is a square 
matrix of Is. We simulated 1000 realizations of s max for various sample 
sizes and correlation coefficients. To get a visual description of the density 
of 

Smax) we provide a pair of plots for each configuration of p and sample 
size n. In Figure 1 we provide the joint density of the two-dimensional polar 
angles (6,cp) of s max . There are four panels in Figure 1, corresponding to all 
combinations of p = 0, 0.9 and n = 10, 100. The mean vector 8 in this plot 
was taken to be 8 = (2, 2, 2) T . In Figure 2 we provide the density of the polar 
residual defined by 1 — s^ ax s max . The four panels of Figure 2 correspond 
to all combinations of p = 0,0.9 and n = 10,100 and two patterns of 8, 
namely, (2,2,2) T and (3,2,1) T . We see from Figure 1 that s max converges 
to a unimodal, normal looking distribution as the sample size increases. 
Interestingly, from Figure 2 we see that the concentration of the distribution 
around the true parameter depends upon the values of 8 and p (which 
together determine s max ). If the components of the underlying random vector 
are exchangeable [e.g., 8 = (2, 2, 2) T ], the residuals tend to concentrate more 
closely around zero [Figure 2(a) and (c)] compared to the case when they 
are not exchangeable [Figure 2(b) and (d)]. 

4.2. Study design. The simulation study consists of three parts. In the 
first part we evaluate the accuracy and precision of s max by estimating its 
bias and mean squared error (MSE). In the second part we investigate the 
coverage probability of bootstrap confidence intervals. In the third part we 
estimate type I errors and powers of the proposed test S njm as well as the 
integral tests I n ^ m and I^ m . 

To evaluate the bias and MSEs we generated Xi, . . . , X n ~ N$(0, S) and 
Yi, . . . , Y m ~ Ns(8, SI) where n = m = 20 or 100 observations. The common 
variance matrix is assumed to have intra-class correlation structure, that is, 
S = (1 — p)l + pj where I is the identity matrix and J is a matrix of ones. 
Various patterns of the mean vectors 8 and correlation coefficient p were 
considered as described in Table 1. 




Fig. 1. Plot of (simulated) polar angles. 
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Table 1 
Bias and MSE of s n 



Bias 



MSE 



(1,1,1) 
(1,1,1) 
(1,1,1) 
(1,1,1) 
(1,1,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 

(1,1,1) 
(1,1,1) 
(1,1,1) 
(1,1,1) 
(1,1,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 
(3,2,1) 



n = m = 20 



-0.25 


0.25 
0.50 
0.90 

-0.25 


0.25 
0.50 
0.90 



-0.25 


0.25 
0.50 
0.90 

-0.25 


0.25 
0.50 
0.90 



0.001 
0.004 
0.009 
0.012 
0.010 

0.018 
0.001 
0.053 
0.060 
0.112 



n = m = 100 



0.00009 
0.00021 
0.00045 
0.00079 
0.00050 

0.02400 
0.00004 
0.05200 
0.06400 
0.14100 



0.072 
0.129 
0.187 
0.216 
0.203 

0.090 
0.066 
0.114 
0.113 
0.170 



0.014 
0.027 
0.041 
0.056 
0.044 

0.039 
0.012 
0.065 
0.077 
0.158 



We conducted extensive simulation studies to evaluate the performance of 
the bootstrap confidence intervals. In this paper we present a small sample 
of our study. We generated data from two 5-dimensional normal populations 
with means and S, respectively, and a common covariance S = (1 — p)I + 
pJ . We considered 5 patterns of p and 2 patterns of sample sizes (n = m = 20 
and 40). The nominal coverage probability was 0.95. Results are summarized 
in Table 2. 

The goal of the third part of our simulation study is to evaluate the 
type I error and the power of the test (3.8). To evaluate the type I error 
three different baseline distributions for the two populations X and Y were 
employed as follows: (1) both distributed as N(0, X);(2) both distributed as 
vriV(0, £) + (1 - n)N{6, S) with vr = 0.2 or vr = 0.8; and (3) both distributed 
as exp(Z) = (exp(Zi), . . . ,exp(Z p )) where Z follows a iV(<5,X). We refer 
to this distribution as the multivariate lognormal distribution. Throughout 
the variance matrix is assumed to have the intra-class structure described 
above. Various patterns of the mean vectors S and correlation coefficient p 
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Table 2 

Coverage probabilities for the bootstrap confidence intervals for 
p — 5 normal data. Pattern i — 1,2 corresponds to 
di = (0.1,0.25,0.5,0.75,0.9) and S 2 = (0.5,0.5,0.5,0.5,0.5) 



Set up Coverage probability 



Pattern 


P 


n = rn = 20 


n = m = 40 


1 


-0.25 


0.981 


0.971 


1 





0.913 


0.918 


1 


0.25 


0.916 


0.933 


1 


0.50 


0.971 


0.969 


1 


0.90 


0.993 


0.989 


2 


-0.25 


0.982 


0.967 


2 





0.984 


0.972 


2 


0.25 


0.986 


0.978 


2 


0.50 


0.968 


0.968 


2 


0.90 


0.950 


0.954 



the dimension p were considered as described in Table 3. Sample sizes of 
n = m = 15 or 25 are reported. 

Power comparisons were carried out for data generated from Xi, . . . , X„ ~ 
iVp(0,S) and Yi, . . . , Y m ~ N P (S, 52) where p = 3 or 5 and a variety of 
patterns for d as described in Table 4. If Roy's maximal separating direction 
(cf. Proposition 3.1) was known then a "natural gold standard" would be 
the test based on $n im (s max ). We shall refer to this test as the true maximal 
direction (TMD) test. Clearly the TMD test cannot be used in practice since 
it involves the unknown direction s max . Nevertheless the TMD test provides 
an upper bound for the power of the proposed test which uses the estimated 
direction. Hence we compute the efficiency of the proposed test relative to 
TMD test. An additional test, referred to as the RMD test is also compared. 
The RMD test has the same form but uses Roy's maximal direction given 
by S _1 (Y — X). As suggested by a reviewer we also evaluated the power 
of the two integral based tests, described in (3.9), which do not require the 
determination of the best separating direction. 

Additionally, in Table 5 we evaluate the type I error and power of our 
test when Xi, . . . ,X n ~ N p (0, 5]) and Yi, . . . , Y m ~ N P (S, £) and n = m = 
p = 10 and n = m = 10 and p = 20 (i.e., p <n set up). Note that in neither 
of these cases the standard Hotteling's T 2 (or Roy's largest root test) can 
be computed whereas the proposed test can be calculated. 

Simulation results reported in this paper are based on 1000 simulation 
runs. Confidence sets are calculated using 1000 bootstrap samples. The boot- 
strap critical values for estimating type I error were based on 500 bootstrap 
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Table 3 

Type I errors for the proposed procedure with 
nominal level a = 0.05. Three types of distributions are considered: 
MVNs, MV-LogN (multivariate lognormal) and Mix-MVN 
(mixtures of MVNs) 



Set 


up 




Type I 


error 


Distribution 


P 


P 


n = m = 15 


n = rn = 25 


MVNs 


3 


—0.25 


0.041 


0.037 


MVNs 


3 


0.00 


0.023 


0.044 


MVNs 


3 


0.25 


0.037 


0.033 


MVNs 


3 


0.50 


0.027 


0.032 


MVNs 


3 


0.90 


0.031 


0.036 


MVNs 


5 


—0.25 


0.035 


0.035 


MVNs 

IV J. V 1 M □ 


5 


0.00 


0.040 


0.041 


MVNs 


5 


0.25 


0.045 


0.032 


MVNs 


5 


0.50 


0.038 


0.043 


MVNs 


5 


0.90 


0.044 


0.031 


MV-LogN 


■'i 


—0.25 


0.025 


0.040 


MV-LogN 


3 


0.00 


0.038 


0.049 


MV-LogN 


3 


0.25 


0.025 


0.027 


MV-LogN 


3 


0.50 


0.028 


0.037 


MV-LogN 


3 


0.90 


0.026 


0.034 


MV-LogN 


5 


-0.25 


0.026 


0.039 


MV-LogN 


5 


0.00 


0.035 


0.018 


MV-LogN 


5 


0.25 


0.039 


0.039 


MV-LogN 


5 


0.50 


0.036 


0.046 


MV-LogN 


5 


0.90 


0.034 


0.042 


Mix-MVNs 


3 


-0.25 


0.032 


0.040 


Mix-MVNs 


3 


0.00 


0.038 


0.028 


Mix-MVNs 


3 


0.25 


0.039 


0.032 


Mix-MVNs 


3 


0.50 


0.036 


0.035 


Mix-MVNs 


3 


0.90 


0.041 


0.028 


Mix-MVNs 


5 


-0.25 


0.042 


0.035 


Mix-MVNs 


5 


0.00 


0.040 


0.031 


Mix-MVNs 


5 


-0.25 


0.041 


0.028 


Mix-MVNs 


5 


0.50 


0.034 


0.040 


Mix-MVNs 


5 


0.90 


0.042 


0.036 



samples. Since the results between 100 bootstrap samples and 500 boot- 
strap samples did not differ by much, all powers were estimated using 100 
bootstrap samples. 

4.3. Simulation results. The Bias and MSEs for the patterns considered 
are summarized in Table 1. It is clear that the bias decreases with the sample 
size as do the MSEs. We observe that the bias tends to be smaller under 
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Table 4 

Type I errors and power for some settings with p>n. Here n = m — 10 and Si has 
components 1/2 and S 2 has components i/p 



Type I error and power p = 


10, n = m = l() 


Type I error and power p = 


20, n = m = 10 


S 


p 


Type I error 


S 


p 


Type I error 





0.00 


0.054 





0.00 


0.081 





0.25 


0.051 





0.25 


0.050 





0.50 


0.028 





0.50 


0.046 





0.90 


0.038 





0.90 


0.048 




Power 






Power 




Si 


0.00 


0.83 


<5i 


0.00 


0.97 


Si 


0.25 


0.48 


Si 


0.25 


0.53 


Si 


0.50 


0.26 


Si 


0.50 


0.42 


Si 


0.90 


0.20 


Si 


0.90 


0.22 


62 


0.00 


0.98 


62 


0.00 


0.98 


62 


0.25 


0.80 


S-2 


0.25 


0.59 


62 


0.50 


0.67 


62 


0.50 


0.43 


s 2 


0.90 


0.71 


s 2 


0.90 


0.40 



independence and negative dependence compared with positive dependence. 
It also tends to be smaller when the data are exchangeable. Although results 
are not presented, we evaluated squared bias and MSE for larger values of 
p (e.g., p = 5, 10 and 20) and as expected the total squared bias and total 
MSE increased with the dimension p. 

In Table 2 we summarize the estimated coverage probabilities of the boot- 
strap confidence intervals when p = 5. Our simulation study suggests that 
the proposed bootstrap methodology seems to perform better for larger sam- 
ple sizes but rather poorly for smaller samples sizes. 

Type I errors for different patterns considered in our simulation study are 
summarized in Table 3. Our simulation studies suggest that in every case 
the proposed bootstrap based test maintains the nominal level of 0.05. In 
general it is slightly conservative. The performance of the test is not affected 
by the shape of the underlying distribution. This is not surprising, owing to 
the nonparametric nature of the test. Furthermore, we evaluated the type 
I error of the proposed bootstrap test for testing the null hypothesis (3.7) 
for p as large as 20 with n = m = 10 and discovered that the proposed test 
attains the nominal level of 0.05 even n < p. See Table 4. As commented 
earlier in the paper, Hotelling's T 2 statistic cannot be applied here since the 
Wishart matrix is singular in this case. However, the proposed method is 
still applicable since the estimation of the best direction does not require 
the inversion of a matrix. 
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Table 5 

Power comparisons of the two proposed test procedures with type I error of 0. 050. Here 
6i = (0.1,0.5,0.9), 6 2 = (0.1,0.25,0.5,0.75,0.9), <5 3 = (0.5,0.5,0.5) and 
8 4 = (0.5,0.5,0.5,0.5,0.5) 



Power and RE % (ra = m = 15) 





Set 


up 






Directional tests 






Integ 


ral tests 




p 




p 


Sri , m 


RMD test 


TMD test 


I ri, . rn, 


1+ 


3 


Ol 


—0.25 


0.79 


(90%) 


0.62 


(71%) 


0.88 


0.89 


(100%) 


0.89 


(100%) 


3 


<5i 


0.00 


0.64 


(82%) 


0.45 


(57%) 


0.78 


0.68 


(87%) 


0.68 


(87%) 


3 


Si 


0.25 


0.53 


(78%) 


0.38 


(56%) 


0.68 


0.54 


(79%) 


0.54 


(79%) 


3 


5i 


0.50 


0.51 


(73%) 


0.41 


(59%) 


0.70 


0.47 


(67%) 


0.47 


(67%) 


3 


Si 


0.90 


0.62 


(64%) 


0.85 


(99%) 


0.97 


0.40 


(41%) 


0.41 


(42%) 


5 


s> 


-0.25 


0.93 


(95%) 


0.74 


(76%) 


0.98 


0.97 


(99%) 


0.97 


(99%) 


5 


s> 


0.00 


0.80 


(87%) 


0.56 


(60%) 


0.92 


0.86 


(93%) 


0.86 


(93%) 


5 


82 


0.25 


0.59 


(73%) 


0.39 


(47%) 


0.81 


0.66 


(81%) 


0.66 


(81%) 


5 


82 


0.50 


0.56 


(67%) 


0.42 


(50%) 


0.84 


0.48 


(57%) 


0.48 


(57%) 


5 


82 


0.90 


0.63 


(64%) 


0.88 


(89%) 


0.99 


0.40 


(40%) 


0.40 


(40%) 


3 


8, 


-0.25 


0.74 


(89%) 


0.54 


(64%) 


0.83 


0.83 


(100%) 


0.83 


(100%) 


3 


8, 


0.00 


0.56 


(87%) 


0.34 


(53%) 


0.64 


0.59 


(92%) 


0.59 


(92%) 


3 


8, 


0.25 


0.42 


(87%) 


0.23 


(48%) 


0.49 


0.46 


(93%) 


0.46 


(93%) 


3 


8, 


0.50 


0.33 


(86%) 


0.15 


(40%) 


0.38 


0.37 


(97%) 


0.37 


(97%) 


3 


8, 


0.90 


0.27 


(83%) 


0.12 


(38%) 


0.32 


0.27 


(83%) 


0.27 


(83%) 


5 


81 


-0.25 


0.92 


(95%) 


0.65 


(68%) 


0.96 


0.95 


(99%) 


0.95 


(99%) 


5 


84 


0.00 


0.75 


(90%) 


0.43 


(51%) 


0.83 


0.82 


(99%) 


0.82 


(99%) 


5 


84 


0.25 


0.49 


(87%) 


0.20 


(35%) 


0.57 


0.60 


(100%) 


0.60 


(100%) 


5 


d 4 


0.50 


0.41 


(90%) 


0.16 


(34%) 


0.45 


0.43 


(100%) 


0.43 


(100%) 


5 


84 


0.90 


0.29 


(92%) 


0.10 


(32%) 


0.31 


0.33 


(100%) 


0.33 


(100%) 



The power of tests (3.8) and (3.9) for various patterns considered in our 
simulation study are summarized in Table 5. 

As expected, in every case the power of the TMD test is higher than 
that of S n , m test and the RMD test. The S n ^ m test is almost always more 
powerful than the RMD test. The relative efficiency of S nm compared to 
the TMD test is quite high in most cases. When n = m = 15 the relative 
efficiency ranges between 65-95%. It is almost always above 90% when the 
sample size increases to 25 per group. In general the two integral tests had 
very similar power. They had larger power than S n m when p < 0. As p 
increased, the power of 5 njTn improved relative to the two integral tests. 
Test (3.8) seems to perform better when the components of S were unequal. 
We also note that when the integral tests outperform S nm the difference 
is usually small, whereas the S nm test can outperform the integral tests 
substantially. For example, observe pattern 2 where the powers of S n>m and 
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Table 5 
( Continued) 



Power and RE % (n = m = 25) 



Set up Directional tests Integral tests 



p 


5 


p 


s 


n,m 


RMD test 


TMD test 


I'r i . r ri. 


1+ 


6 


01 


— u.zo 


U.yD 


^o/o ) 


0.90 


(91%) 


U.yo 


0.98 


(100%) 


0.98 


(100%) 


Q 
O 


01 


o on 


u.oo 


a£ /0 ) 


0.72 


(78%) 


Q9 
u.yz 


0.85 


(92%) 


0.86 


(92%) 


3 


*1 


0.25 


0.80 


(88%) 


0.69 


(76%) 


0.90 


0.75 


(83%) 


0.75 


(83%) 


3 


*1 


0.50 


0.75 


(84%) 


0.67 


(75%) 


0.89 


0.66 


(74%) 


0.66 


(74%) 


3 


*1 


0.90 


0.89 


(89%) 


0.98 


(99%) 


1.00 


0.59 


(59%) 


0.61 


(61%) 


5 


<5> 


-0.25 


1.00 


(100%) 


0.98 


(98%) 


1.00 


1.00 


(100%) 


1.00 


(100%) 


5 




0.00 


0.96 


(97%) 


0.85 


(86%) 


0.99 


0.98 


(99%) 


0.98 


(99%) 


5 


s s 


0.25 


0.85 


(88%) 


0.74 


(76%) 


0.97 


0.83 


(86%) 


0.83 


(86%) 


5 


s. 


0.50 


0.81 


(84%) 


0.74 


(77%) 


0.96 


0.70 


(73%) 


0.70 


(73%) 


5 


s 2 


0.90 


0.90 


(90%) 


0.99 


(100%) 


1.00 


0.57 


(57%) 


0.58 


(58%) 


3 


s. 


-0.25 


0.94 


(96%) 


0.85 


(87%) 


0.98 


0.96 


(98%) 


0.96 


(98%) 


3 


s. 


0.00 


0.75 


(92%) 


0.57 


(69%) 


0.82 


0.79 


(96%) 


0.79 


(96%) 


3 


s. 


0.25 


0.62 


(89%) 


0.39 


(56%) 


0.70 


0.66 


(94%) 


0.66 


(94%) 


3 


s. 


0.50 


0.54 


(90%) 


0.31 


(52%) 


0.60 


0.55 


(92%) 


0.55 


(92%) 


3 


8, 


0.90 


0.44 


(90%) 


0.20 


(42%) 


0.49 


0.42 


(86%) 


0.42 


(86%) 


5 


64 


-0.25 


0.99 


(99%) 


0.94 


(94%) 


1.00 


1.00 


(100%) 


1.00 


(100%) 


5 


<54 


0.00 


0.94 


(96%) 


0.72 


(74%) 


0.97 


0.97 


(100%) 


0.97 


(100%) 


5 


<54 


0.25 


0.71 


(90%) 


0.41 


(52%) 


0.79 


0.79 


(100%) 


0.79 


(100%) 


5 


<54 


0.50 


0.58 


(91%) 


0.25 


(39%) 


0.63 


0.64 


(100%) 


0.64 


(100%) 


5 


54 


0.90 


0.42 


(87%) 


0.18 


(36%) 


0.49 


0.46 


(94%) 


0.46 


(94%) 



In,m are 0.93 and 0.97, respectively, when p = —0.25 and 0.63 versus 0.40 
when p = 0.90. 

5. Illustration. Prior to conducting a two-year rodent cancer bioassay to 
evaluate the toxicity/carcinogenicity of a chemical, the National Toxicology 
Program (NTP) routinely conducts a 90-day pre-chronic dose finding study. 
One of the goals of the 90-day study is to determine the maximum tolerated 
dose (MTD) that can be used in the two-year chronic exposure study. Ac- 
curate determination of the MTD is critical for the success of the two-year 
cancer bioassay. Cancer bioassays are typically very expensive and time con- 
suming. Therefore their proper design, that is, choosing the correct dosing 
levels, is very important. When the highest dose used in the two-year study 
exceeds the MTD, a large proportion of animals in the high dose group (s) 
may die well before the end of the study, and the data from such group (s) 
cannot be used reliably. This results in inefficiency and wasted resources. 
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Typically the NTP uses the 90-day study to determine the MTD on the 
basis of a large number of correlated endpoints that provide information 
regarding toxicity. These include body weight, organ weights, clinical chem- 
istry (red blood cell counts, cell volume, hemoglobin, hematocrit, lympho- 
cytes, etc.), histopathology (lesions in various target organs), number of 
deaths and so forth. The dose response data is analyzed for each variable 
separately using Dunnett's or the Williams's test (or their nonparametric 
versions, Dunn's test and Shirley's test, resp.). NTP combines results from 
all such analyses qualitatively and uses other biological and toxicological 
information when making decisions regarding the highest dose for the two- 
year cancer bioassay. Analyzing correlated variables one at a time may result 
in loss of information. The proposed methodology provides a convenient 
method to combine information from several outcome variables to make 
comparisons between groups. 

We now illustrate our methodology by re-analyzing data obtained from a 
recent NTP study of the chemical Citral [NTP (2003)]. Citral is a flavoring 
agent that is widely used in a variety of food items. The NTP assigned a ran- 
dom sample of 10 male rats to the control group and 10 to the 1785 mg/kg 
dose group. Hematological and clinical chemistry measurements such as the 
number of platelets (in 1000 per 1), urea nitrogen (UN) (in mg/dl), alkaline 
phosphatase (AP) (in IU/1) and bile acids (BA) (in mol/1) were recorded on 
each animal at the end of the study. The NTP performed univariate anal- 
ysis on each of these variables and found no significant difference between 
the control and dose group except for the concentration of urea nitrogen 
which was increased in the high dose group. This increase was marginally 
significant at the 5% level and not at all after correcting for multiplicity. We 
applied the proposed methodology to compare the control with the high-dose 
group (1785 mg/kg) in terms of all nonnegative linear combinations of the 
above mentioned four variables. We test the null hypothesis of no difference 
between the control and the high-dose group against the alternative that the 
high-dose group is stochastically larger (in the above four variables) than 
the control group. The resulting p-value based on 10,000 bootstrap samples 
was 0.025, which is significant at a 5% level of significance. The estimated 
value of s max was (0.074, 0.986, 0.012, 0.150) T and the estimated 95% confi- 
dence region is given by {s £ 1 : s^ ax s < 0.93}. Hence the confidence set 
includes any s which is within 21.5° degrees of s max . This is a relatively 
large set due to the small sample sizes. Clearly our methodology appears to 
be sensitive to detect statistical differences which were not noted by NTP. 
Furthermore, our methodology allows us to infer that indeed 1785 mg/kg 
dose group is larger in the multivariate stochastic order than the control 
group. This is a much stronger conclusion than the simple ordering of their 
means. Thus we believe that the proposed framework and methodology for 
studying ordered distributions can serve as a useful tool in toxicology and is 
also applicable to a wide range of other problems as alluded to in this paper. 
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6. Concluding remarks and some open problems. In many applications, 
researchers are interested in comparing two experimental conditions, for ex- 
ample, a treatment and a control group, in terms of a multivariate response. 
In classical multivariate analysis one addresses such problems by comparing 
the mean vectors using Hotelling's T 2 statistic. The assumption of MVN, 
underlying Hotelling's T 2 test, may not hold in practice. Moreover if the 
data is not MVN, then the comparison of population means may not al- 
ways provide complete information regarding the differences between the 
two experimental groups. Secondly, Hotelling's T 2 statistics are designed for 
two-sided alternatives and may not be ideal if a researcher is interested in 
one-sided, that is, ordered alternatives. Addressing such problems requires 
one to compare the two experimental groups nonparametrically in terms of 
the multivariate stochastic order. Such comparisons, however, are very high 
dimensional and not easy to perform. 

In this article we circumvent this challenge by considering the notion of 
the linear stochastic order between two random vectors. The linear stochas- 
tic order is a "weak" generalization of the univariate stochastic order. The 
linear stochastic order is simple to interpret and has an intuitive appeal. 
Using this notion of ordering, we developed nonparametric directional infer- 
ence procedures. Intuitively, the proposed methodology seeks to determine 
the direction that best separates two multivariate populations. Asymptotic 
properties of the estimated direction are derived. Our test based on the best 
separating direction may be viewed as a generalization of Roy's classical 
largest root test for comparing several MVN populations. To the best of our 
knowledge this is the first general test for multivariate ordered distributions. 
Since in practice sample sizes are small, we use the bootstrap methodology 
for drawing inferences. 

We illustrated the proposed methodology using a data obtained from a 
recent toxicity /carcinogenicity study conducted by the US National Toxi- 
cology Program (NTP) on the chemical Citral. A re-analysis of their 90-day 
data using our proposed methodology revealed a linear stochastic increase 
in platelets, urea nitrogen, alkaline phosphatase and bile acids in the high- 
dose group relative to the control group, which was not seen in the original 
univariate analysis conducted by the NTP. These findings suggest that the 
proposed methodology may have greater sensitivity than the commonly used 
univariate statistical procedures. Our methodology is sufficiently general 
since it is nonparametric and can be applied to discrete and/or continuous 
outcome variables. Furthermore, our methodology exploits the underlying 
dependence structure in the data, rather than analyzing one variable at a 
time. 

We note that our example and some of our results pertain to continuous 
RVs. However, the methodology may be used, with appropriate modification 
(e.g., methods for dealing with ties) with discrete (or mixed) data with 
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no problem. Although the focus of this paper has been the comparison of 
two multivariate vectors, in many applications, especially in dose response 
studies, researchers may be interested in determining trends (order) among 
several groups. Similarly to classical parametric order restricted inference 
literature, one could generalize the methodology developed in this paper 
to test for order restrictions among multiple populations. For example, one 
could extend the results to K > 2 RVs ordered by the simple ordering, that 
is, Xi -<i- 8 t X2 -<i-gt • • • -^z-st Xjf or to RVs ordered by the tree ordering, 
that is, Xi -<i-st Xj where j = 2, . . . , K . As pointed out by a referee the 
hypotheses Hq : X ^ st Y versus H\ : X 7^ st Y can also be formulated and 
tested using the approach described. First note that the null hypothesis 
implies ^(s) > 1/2 for all s £ <S2~ . On the other hand under the alternative 
there is an s 6 S? - for which ^(s) < 1/2. Thus a test may be based on the 
statistic 

iV 1 / 2 (* njm (s min )-l/2), 

where s m i n is the value which minimizes ^f nm (s). It is also clear that the 
least favorable configuration occurs when ^(s) = 1/2 for all s £ S^T 1 which 
is equivalent to X = st Y. 

We believe that the result obtained here may be useful beyond order 
restricted inference. Our simulation study suggests that our estimator of the 
best separating direction, that is, (3.5) may be useful even in the context of 
classical multivariate analysis where it may be viewed as a robust alternative 
to Roy's classical estimate. Finally we note that the linear stochastic order 
may be useful in a variety of other statistical problems. For example, we 
believe that it provides a useful framework for linearly combining the results 
of several diagnostic markers. This is a well-known problem in the context 
of ROC curve analysis in diagnostic medicine. 

APPENDIX: PROOFS 

Proof of Theorem 2.1. (i) Let g:W — >R n be an affine increasing 
function. Clearly g(x) = v + Mx for some n vector v and n x p matrix M 
with nonnegative elements. Thus for any u S we have s = M T u S 
Hence 

u T 9 (X) = u T (v+MX) = u T v+s r X ^ st u T v+s T Y=u r (v+MY)=u T 5 (Y) 

as required where the inequality holds because X ^/- s t Y. (ii) Fix J € {1, ... , 
p}. Let X = (X/,Xj), Y = (Y/,Yj) where I is the complement of I in 
{1, . . . ,p}. Further define s T = (sj,sj) where s e R^_, and set sj = 0. It 
follows that for all s 7 G R dim ( / ) we have 

sJX 7 = s T X ^ st s T Y = s^Y T 
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as required, (iii) Let <j) : R — > R be any increasing function. Note that 

E((/>(s T X)) = E(E(c/>(s T X)|Z)) < E(E(0(s T Y)|Z)) = E(0{s T Y)). 

The inequality is a consequence of X|Z = z ^/- s t Y|Z = z. Since ^> is arbi- 
trary it follows that X ^_ st Y as required, (iv) Let X = (Xi, . . . , X ra ), and 
define Y similarly. Let s 6 R+ where p = p\ H h p n . Now 

s T X = sf X x + • • • + s^X n and s r Y = sf Yi + • • • + s^Y n , 

by assumption s^Xj ^ st s?Yj for i = 1, . . . ,n. In addition s^Xj and s JXj 
are independent for i ^ j. It follows from Theorem 1.A.3 in Shaked and 

Shanthikumar (2007) that sf Xi H h s£X„ ^ st s{ Yi H h s^Y n , that 

is, X ;^_ st Y as required, (v) By assumption X n =>• X and Y n =>• Y where the 
symbol =^ denotes convergence in distribution. By the continuous mapping 
theorem s T X n => s T X and s T Y n =>• s T Y. It follows that 

(A.l) P(s T X n > i) -> P(s T X > t) and P(s T Y n > t) -»• P(s r Y > t). 

Moreover since X n zf^- s t Y n we have 

(A.2) F(s T X n >t)< P(s T Y n > i) for all n £ N. 

Combining (A.l) and (A.2) we have P(s T X > t) < P(s T Y > t), that is, 
X^, _st Y as required. □ 

Before proving Theorem 2.2, we provide a definition and a preliminary 
lemma. 

Definition A.l. We say that the RV X has an elliptical distribution 
with parameters /x and 5] and generator </>(•), denoted X ~ E p (fj,, S, (f>), if 
its characteristic function is given by exp (it T /x)0(t T St). 

For this and other facts about elliptical distributions which we use in the 
proofs below, see Fang, Kots and Ng (1989). 

Lemma A.l. Let X ~ E±(p,a, (f>) and Y ~ Ei(p' , a' , cj)) be univariate 
elliptical RVs supported on R. Then X ^ st Y if and only if fi< jj! and a = a' . 

Proof. Since X and Y have the same generator they have the stochastic 
representation: 

(A.3) X= st p + aRU and Y = st p! + a'RU, 

where R is a nonnegative RV, independent of the RV U, satisfying W(U = 
±1) = 1/2; cf. Fang, Kots and Ng (1989). It follows that RU is a symmetric 
RV supported on R with a strictly increasing DF which we denoted by Fq. 
Let Fx and Fy denote the DFs of X and Y, respectively. Note that X ^ st Y 
if and only if Fx(t) > Fy(t) for all t G R, or equivalently by (A.3), if and 
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only if 



(A.4) 



F 



t — fi 



a 



) 



^0 




) 



for all tel. It is obvious that (A.4) holds when p, < p' and a = a' , estab- 
lishing sufficiency. Now assume that X ^ st Y. Put t = \i in (A.4), and use 
the strict monotonicity of Fo to get > (/i — //)/cr', that is, // > /i. Suppose 
now that a 1 > a. It follows from (A.4) and the the strict monotonicity of Fq 
that (t — fJ.)/(J > (t — n')/a' which is equivalent to t > (pa' — p!a)/(a' — a). 
The latter, however, contradicts the fact that (A.4) holds for all tel. A 
similar argument shows that a' < a cannot hold; hence we must have a = a' 
as required. □ 

Remark A.l. Note that Lemma A.l may not hold for distributions 
with a finite support. For example, if R ~ U (0, 1), then by (A. 3) X ~ U(fx — 
a,fi + a) and Y ~ U(fi' — a' ',// + <j'). It is easily verified that in this case 
X ^ st Y if and only if A = // — [i > and — A < a' — a < A; that is, it is not 
required that a = a' . Hence the assumption that X and Y are supported on 
]R is necessary. 

We continue with the proof of Theorem 2.2. 

Proof of Theorem 2.2. Let X and Y be be E p (fj,, S,0) and E p (fx', 
X', cj)) supported on MP. Suppose that X ^- st Y. Choose s = e.; where ejfe = 1 
if i = k and otherwise. It now follows from Definition 1.1 that Xj ^ s t Y{. 
Since X{ and Yi are marginally elliptically distributed RVs with the same 
generator and supported on M, then by Lemma A.l we must have 

(A.5) Vi<Vi and an = a' u . 

The latter holds, of course, for all 1 < % < p. Choosing s = + ej we have 
Xi + Xj ^ st Yi + Yj. Note that Xi + Xj and Yi + Yj are supported on 1R and 
follow a univariate elliptical distribution with the same generator [Fang, 
Kots and Ng (1989)]. Applying Lemma A.l again we find that 

(A. 6) ^ + Hj < /x ■ + fjfj and an + ajj + 2a \j = a'n + a'^ + 2a'^ . 

The latter holds, of course, for all 1 < i ^ j < p. It is easy to see that equa- 
tions (A.5) and (A. 6) imply that /x < fi' and S = 5]'. Recall [cf. Fang, Kots 
and Ng (1989)] that we may write X = st fi + RSU and Y = st // + i?SU 
where £ = 5 T S, U is a uniform RV on , and R is a nonnegative RV. 
Let S be an upper set in MP. Clearly the set [S — fj,] := {x — fx : x G S} is also 
an upper set and [S — fj] C [S — fj,'] since /x < fi' . Now, 



P(X G S) = P(X G [5 - fi]) < P(X G [5 - fif]) = P(Y G S), 

where Xo = RSJJ, hence X ^ st Y. This proves the "if" part. The "only if 
part follows immediately. □ 
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Proof of Theorem 2.3. Let X p = {x: (x±, . . . ,x p ) G {0, 1} P } denote 
the support of a p-dimensional multivariate binary (MVB) RV. By definition 
the relationship X rfj- s t Y implies that for all (t, s) G M+ x R?_, 

(A.7) P(s T X>t) <P(s r Y>t). 

Now note that 

P(s T X > t) = £ /(x)I (s T x>t) and P(s T Y > t) = £ <?(x)I (s T x>t) , 

(A.8) 

where / and 5 are the probability mass functions of X and Y, respectively. 
Let U be an upper set on X p . It is well known [cf. Davey and Priestley 
(2002)] that U can be written as 

(A.9) U=\JU(x j ), 

where Xj are the distinct minimal elements of U, and U(x.j) = {x:x > x.,} 
are themselves upper sets [in fact U(xj) is an upper orthant]. The set {x^ : 
j G J} is often referred to as an anti-chain. Now observe that for any s G 
the set {x:s T x > t] is an upper set. Hence it must be of the form of (A.9) 
for some anti-chain {xj : j G J}. Suppose now, that for some U G X v there 
is a vector Su G such that U = {x : s^x > i] for some fixed t > 0. Then 
using (A.7) and (A.8) we have 

P(XGC/)= /(x)=P(s^X>t)<P(s^Y>t) 

xG{x : sjj-x.>i] 

= J2 9(x)=P(YGt/). 

xG{x : Sj^x>t} 

We will complete the proof by showing that for each upper set U G X p , we can 
find a vector sjj for which s^x > t for xGfJ and s^x < t for x G U c = X \ U 
if and only if p < 3. To do so we will first solve the system of equations 
s T Xj = t for j G J. This system can also be written as Xs = t where 



X: 



xi 



.xj. 



is a J x p matrix whose rows are the member of the anti-chain defining 
U, and t = (t, . . . ,t) has dimension J. Clearly the elements of X are ones 
and zeros. If J <p, the matrix X is of full rank since its rows are linearly 
independent by the fact that they are an anti-chain. Hence a solution for 
s exists. With a bit of algebra, we can further show that a solution s > 
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exists. This, of course, is trivially verified when p < 3. Now set sjj = s + e 
for some e > 0. It is clear that we can choose e small enough to guarantee 
that s^x > t if and only if x 6 U. Hence if J < p, upper set (A. 9) can be 
mapped to a vector sy. However, the inequality J < p for all upper sets 
U C X p holds if and only if p < 3. This can be easily shown by enumerating 
all 18 upper sets belonging to X% [cf. Davidov and Peddada (2011)] and 
noting that they have at most three minimal elements. Hence if p < 3, then 
X ;<z- s t Y <^=^ X ^ st Y as required. 

Now let p = 4, and consider the upper set U generated by the anti-chain 
Xj, j = 1, . . . , J where x,- are all the distinct permutations of the vector 
(1, 1, 0, 0). Clearly J = 6. Note that although J > p, the system of equations 
Xs = t is uniquely solved by s£ = (t/2, i/2, t/2, i/2). However, this solution 
coincides with the solution of the system X's = t where X' is any matrix 
obtained from X by deleting any two (or just one) of its rows. Note that 
the rows of X' correspond to an upper set U' C U. This, in turn, implies 
that for any such U' one cannot find a vector sj// satisfying s^,x > t if and 
only if x E V because the inequality will hold for all xG(7. Thus U' does 
not define an upper half plane. This shows that the linear stochastic order 
and the multivariate stochastic order do not coincide when p = 4. A similar 
argument may be used for any p > 5. This completes the proof. □ 

We first define the term copula. 

Definition A. 2. Let F be the DF of a p-dimensional RV with marginal 
DFs Fi, . . . , F p . The copula C associated with F is a DF such that 

F(x) = C(x) = C(Fx( Xl ), F p (x p )). 

It follows that the tail-copula C(-) is nothing but the tail of the DF C(-). 

Proof of Theorem 2.4. Suppose that X and Y have the same copula. 
Let X ;^- s t Y. Choosing s = where = 1 if i = k and otherwise, we 
find using the definition that Xi ^ s t Y%- The latter holds, of course, for all 
1 <i <p. Applying Theorem 6.B.14 in Shaked and Shanthikumar (2007), 
we find that X ^ st Y. The reverse direction is immediate. □ 

Proof of Theorem 2.5. Note that for any x E W we have 
F(x) = C x (Fi(xi), . . .,F p (x p )) > Cx(Gi(xi), . . . , G p (x p )) 
>C Y (G 1 (x l ),...,G p (x p )) = G(x). 



This means that X Y. The other part of the theorem is proved similarly. 
□ 
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Proof of Proposition 3.1. Let X and Y be independent MVNs with 
means fi < v and common variance matrix S. Clearly 

where <3? is the DF of a standard normal RV. It follows that P(s T X < s T Y) 
is maximized when the ratio s T (v — fJ.)/V s T Ss is maximized. From the 
Cauchy— Schwarz inequality we have 

(A.10) < J [U - H)TV-\ V - y) 

for all s. It is now easily verified that s = — /x) maximizes the left- 

hand side of (A. 10). □ 

Proof of Proposition 3.2. Let Q q , q=l,...,4, be the four quad- 
rants. It is clear that maximizing (3.1) is equivalent to maximizing 

(A.ll) <, ro (s)= Vz 

ZijGQ2 Zij6Q4 

It is also clear that for any s the indicators I( s tz > ) are independent of 
the length of Zy which we therefore take to have length unity. Observe that 
the value of (A.ll) is constant in the intervals (6[i],9[i+i]) where 0m are 
defined in Algorithm 3.1. At each point 9\q, i = 0, . . . ,M + 1, the value of 

(A.ll) may increase or decrease. It follows that for all s G fy' nm (s) £ 
{^ m (s[o])) • • • , *n,m( s [M+i])} wnere are defined in Algorithm 3.1. There- 
fore the maximum value of (3.1) is an element of the above list. Now sup- 
pose that sm is a global maximizer of (A.ll). Clearly either m (srji) = 

*nm( S [i-l]) OT ^n,m( S [i\) = ^n,m( S [i+l]) must hold > in whicn case an y value 

in , or , is a global maximizer. This concludes the proof. □ 

Proof of Theorem 3.1. Using Hajek's projection and for any s, we 
may write 

n m 

(A.12) * n , m (a) = *(s) + n- 1 Vi(Xi, s) + m" 1 M^j , s) + #n, m ( s )> 

i=i j=i 

where 

^i(X i ,s) = G(s T X i )-^(s), 

^ 2 (Y,,s) = F(s T Y J )-M/(s) 

and Rn^ m (s) is a remainder term. Here G?(s T x) = P(s r Y > s T x), F(s r y) = 
P(s T X < s T y) and *(s) = E(G(s T X i )) = E(F(s T Yj)). Clearly E[V>i(X;,s)] = 
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E[t/>2(Yj, s)] = for all i and j, so by the strong law of large numbers 
n_1 E"=iV'i(Xi,! 
ability one. Now, 



n 1 Yli=i > s ) ana ^ m 1 S jLi ^2 ( Yj , s) both converge to zero with prob- 



SUp |*n,m(s) - *(s)| < SUp 



+ sup 



+ sup \Rn jm (s)\. 

The set 5? _1 is compact, and the function ^i(x,s) is continuous in s £ 
for all values of x and bounded [in fact |^i(x, s)| < 2]. Thus the condi- 
tions in Theorem 3.1 in DasGupta (2008) are satisfied, and it follows that 
sup s&s? >-i |n _1 Ya=i ^i(-^-i) s )l ~^ as n — > oo. Similarly sup s( _ 5 p-i jm" 1 x 

^(Yj, s)| ^4 as m — > oo. Since x I' n>rn (s) is bounded all its moments 
exist; therefore from Theorem 5.3.3 in Serfling (1980) we have that with 
probability one i? n/m (s) = o(l/N). Moreover it is clear that the latter holds 
uniformly for all s. Thus, 

sup \^ n ,m{^) — VP(s)| ^4 as n,m->oo. 

By assumption ^(s max ) > ^(s) for all s £ \ s max so we can apply The- 
orem 2.12 in Kosorok (2008) to conclude that 

a.s. 

Smax r S max , 

that is, s max is strongly consistent. This completes the first part of the proof. 

Since the densities of X and Y are differentiable, it follows that ^(s) is 
continuous and twice differentiable. In particular at s max £ , the matrix 
-V 2 ^( 

s max) exists and is positive definite. A Taylor expansion implies that 
sup tf(s) -^(s max ) < -C5 2 . 

||s— s max ||<<5 
It is also obvious that 

n 

7i- 1 ^Vi(X i ,s max ) = O p (l/ViV) 

i=l 

and 

m 

m- 1 J^MYj, s max ) = O p (l/VN). 

3=1 

Finally as noted above \R n ,m( s )\ — 0(1/N) for all s as n, m — > oo. There- 
fore by Theorem 1 in Sherman (1993) we have that 

(A.13) s max = s max + O p (iV- 1 / 2 ); 
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that is, s max converges to s max at a iV 1 / 2 rate. This completes the second 
part of the proof. 

The functions ^/(s),ipi(X.i,s) and ^ ( Yj , s) on the right-hand side of 
(A. 12) all admit a quadratic expansion. A bit of algebra shows that for 
s in an O p {N~ 1 / 2 ) neighborhood of s max , we have 

*n,m(s) = ^(Smax) + (s - S max ) T ^"^ 

(A.14) i 

+ -(s - s max ) T V(s - s max ) + o p (l/N), 

where 

M i Er=iWi(x„s max ) | i E^iV^CYj.s^) 

y/K^m n 1 / 2 ^/l — \ n ^ m TO 1 / 2 

A n ,m = n/N, for j = 1,2 the function Vipj(-, s max ) T is the gradient of ipj(-,s) 
evaluated at s max , and the matrix V is given by 

V = E(V 2 Vi (X, s max )) + E(V 2 V 2 (Y, s max )). 

Note that the o p (l/N) term in (A.14) absorbs R 

n,m( s ) i n (A. 12) as well as 
the higher-order terms in the quadratic expansions of ^(s), n _1 Y17=i V'i(Xi, s) 
and m" 1 Y^=i ^(Yj, s). Now by the CLT and Slutzky's theorem, we have 
that 



M„ ]m =>■ N(0, A) 



where 



A = ^E(V Vi (X, s max ) Wi (X, s max ) T ) 

+ T -^-E(VV 2 (Y,s max )VV 2 (Y, 

Finally it follows by Theorem 2 in Sherman (1993) that 

iV 1/2 (s max -s max )^iV(0,S), 
where S = V _1 Ay _1 , completing the proof. □ 

Proof of Theorem 3.2. Suppose that X and Y are discrete RVs 
with finite support. Let p a = P(X = x a ) > and % = P(Y = y&) > where 
a = 1, . . . , A and b = 1, . . . ,B; A and B are finite. Define the set S a b = {s € 
S+~ :s T x a < s^y;,}. A simple argument shows that 

A B K 
*(s) =P(s T X < s T Y) = ^^I (seSa6 )Pa% = J2 a k\seS k ), 

a=l b=l k=l 
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where K < 2 AB is finite, the sets are distinct, (Jfc=i Sfc = and 
a k = H{a,b)£j k Paqb with J k = {(a, b) : S fe n S o6 / 0}. Thus *(s) is a sim- 
ple function on and S max is the set associated with the largest a^. We 
will assume, without any loss of generality, that a\ > otk for all k>2. Now 
note that 

- n m A B 

* n ' m(s) = ^ EEVxi^Yj) = EE ~^r l te s <*h 

i=l j=l o=l 6=1 

where n a = XT=i 1 (x i =x a ),'7i6 = ^(Y^y,,) where a = l,...,A and 6 = 

1, . . . , B. Clearly ^ n]m (s) is also a simple function. Moreover for large enough 
n and m we will have n a > and toj > for all a = 1, . . . , A and & = 1, . . . , B, 
and consequently ^ / njm (s) is defined over the same sets as ^(s), that is, 

K 

*n,m(s) = ^d fc I (sgSfc) , 
fe=l 

where dfc = Yl( a b)eJ k PaQb with p a = n a /n and = m\,/m. Furthermore the 
maximizer of * n ,m(s) is any s € provided that is associated with the 
largest dfc. Hence, 



F(s r 

(A.15) 



£ Smax) = p(axg max a fc / l) = P [ |J {di < a k } J 

\fc=2 / 

max P(di < d^) 

fc=2 2<k<K 



A bit of rearranging shows that 

(a,6)GJi {a,b)£j k 

(n m \ 
--^EE^ . 
i=i i=i / 

where 

Z ij ] = E I (Xi=x») I (Yi=y 6 ) - E ^X^x^Y^)- 

(a,fe)eJi\J fe (o,6)eJ fc \Ji 

Note that ; may be viewed as a kernel of a two sample [/-statistic. More- 
over 

-\Jk\Jl\<Z^ ] <\J k \J!\ 



is bounded (here | • | denotes set cardinality) and E(zjj^) = /Xfe = E(di 



i.i 



dfc) > by assumption. Applying Theorem 2 and the derivations in Sec- 
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tion 5b in Hoeffding (1963) we have that 

(A.16) F^n m 2,2^ <0J<exp^ 2( | Jfe Ul | + | ^ \ Jfe | )2 

where \ n) m = n/N — > A G (0, 1) as n, m — >• oo. Finally from (A. 15) and (A.16) 
we have that 

11 V°max 

where Ci = K — 1 and C 2 = min(A, 1 - A) min 2/x|(| J k \J\\ + \J\ \ Jk\)~ 2 
completing the proof. □ 

Proof of Theorem 3.3. Choose e > 0. We have already seen that 
under the stated conditions, Vl/(s) is continuous, and therefore for each s the 
set B S)£ = {s' : |*(s') - ^(s)| < e} is open. The collection {B s , e : s G is 
an open cover for <S? _1 . Since is compact there exists a finite subcover 

-B sii£ , . . . , -Bs K ,e for iS^ -1 where K < oo. Hence each s belongs to some B s >e 
and therefore 

sup |*iv(s) - \&(s)| 

< max sup |\Pjv(s) — \Pjv(sj)| 
i<i<K seBs , E 

+ max |*jv(sj) - *(si)| + max sup |*(sj) - \P(s)|. 

l<i<K l<i<K sGBsiE 

By construction sup sgBs e (^(sj) — ^(s)| < e for all i. By the law of large 
numbers ^/^(sj) ^4' \£(sj) as N — > oo for each i = 1, . . . ,K. Since K is finite, 

K 

max |tf jv(si) - *(si)| < V|*jv(si) - #(s;)| ^4' 0. 

Ki<A ' — ' 
i=l 

Now for any s' G £ s . )£ , *jv(s') ^ *(s'), ^Tv(sj) ^ and |*(s')-*(Si)| < 
e. This implies that we can choose N large enough so ^^(s') — ( s i) I < 2e. 
Moreover this bound holds for all s' G B s , )£ and i so 

limsup|^Ar(s) - \t(s)| < 3e 
on S+~ ■ Since e is arbitrary we conclude that sup gg5 p-i |^jv( s ) ~~ ^( s )| ~^ 

as N — > oo. By assumption Vf^Smax) > ^(s) for all s G \ s max , so we can 
apply Theorem 2.12 in Kosorok (2008) to conclude that 

a.s. 

^max r S max , 

that is, s max is strongly consistent. This completes the first part of the proof. 
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We have already seen that 

sup *(s) - *(s max ) < -C5 2 

||s S max || 

holds. We now need to bound E* su P || s _ Smax || <(5 N 1 / 2 ] (P 7V (^( S ) - ^(s max ))|, 
where E* denotes the outer expectation and \I/(s) = I( s t z >o) — ^( s )- We first 
note that the bracketing entropy of the upper half-planes is of the order 5/e 2 . 
The envelope function of the class I( s t z>0 ) — I( s t z>0 ) where ||s — s max || < 5 
is bounded by I (s T z > 0>s T axZ) + I (s T axZ > 0>s T z) whose squared L 2 norm is 

(A.17) P(s T Z > > s max Z) +P(sL x Z > > s T Z). 

Note that we may replace the RV Z in (A.17) with the RV Z' = Z/\\Z\\ whose 
mass is concentrated on the unit sphere. The condition that ||s — s max || < 5 
implies that the angle between s and s max is of the order 0(<5), and therefore 
P(s r Z' > > s max Z') is computed as surface integral on a spherical wedge 
with maximum width 5. It follows that (A.17) is bounded by 2y4 p _i<5||/i / || 00 
where A p _i is the area of , and ||/i'||oo is the supremum of the density 
of Z'. Clearly ||/i'||oo < oo since the density of Z is bounded by assumption. 
Thus by Corollary 19.35 in van der Vaart (2000) we have 

E* sup AT 1 /2|P JV (§( S ) _ #( Smax ))| < C5 1 ' 2 . 

||s— s max ||<<5 

It now follows that 

(A.18) ^iv(s max )> sup ^7v(s)-o p (A^- 2 / 3 ), 

which implies by Theorem 5.52 in van der Vaart (2000) and Theorem 14.4 
of Kosorok (2008) that 

^max — S max + Op(N I ), 

that is, s max converges to s max at a cube root rate. This completes the second 
part of the proof. 

The limit distribution is derived by verifying the conditions in Theo- 
rem 1.1 of Kim and Pollard (1990), denoted henceforth by KP. First note 
that (A.18) is condition (i) in KP. S ince s max is consistent, condition (ii) 
also holds, and condition (iii) holds by assumption. The differentiability 
of the density of Z implies that ^(s) is twice differentiable. The unique- 
ness of the maximizer implies that — V 2x I'(s max ) is positive definite, and 
hence condition (iv) holds; see also Example 6.4 in KP for related cal- 
culations. Condition (v) in KP is equivalent to the existence of the limit 
H(u, v) = lim^—^oo aE(f(s max + u/a)^(s max + v/a)) which can be rewrit- 
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ten as 

lim -jr[P((s max + /3u) T Z > 0, (s max + /3v) T Z > 0) 
p->o p 

- P((Smax + /?U) T Z > 0)P((s max + /3v) T Z > 0)]. 

With some algebra we find that this limit exists and equals 

£( s max z )( zT (u + V ))M Z ) d z 



zeS_ 



p-i 



~ /( s max z )( zTu )M z )d z / ,*( s L z )( zTv )M z )^. 

where <5(s max z) is the usual Dirac function; hence integration is with respect 
to the surface measure on {s max z = 0}. It follows that condition also (v) 
holds. Conditions (vi) and (vii) were verified in the second part of the proof. 
Thus we may apply Theorem 1.1 in KP to get 

iV 1/3 (s m ax - s max ) argmax{-Q(s) + W(s) :s £ S^ 1 }, 

where by KP Q(s) = s T V 2 ^(s max )s and W(s) is a zero mean Gaussian pro- 
cess with covariance function H(u,v). This completes the proof. □ 

Proof of Proposition 3.3. Note that 

*(b) = F(s^X < s^Y) = F(s T Z > 0) =P ( sr 2-f* > 

V vs J 2JS vs J 2JS 

s r <5 



Now, by assumption the DF F is independent of s. Therefore ^(s) is uniquely 
maximized on S^T if and only if the function 

s T S 



X s 



is uniquely maximized on 5? — 1 . If S = /, then x(s) = s T 5, and we wish to 

maximize a linear function on It is easily verified (by using ideas from 

linear programming) that the maximizer is unique if S > which is true by 
assumption. Incidentally, it is easy to show directly that x(s) is maximized 
at s*/||s*|| where 

s* = (5il( 5l > ),...,5pl( 5p > )). 

Now let S/I and assume that a unique maximizer does not exist; that 
is, suppose that x(s) is maximized by both si and S2- It is clear that 
x(AiSi) = x(A2S2) for all Ai,A2 > 0; that is, the value of *c(-) is constant 
along rays through the origin. The rays passing through Si and S2, respec- 



38 



O. DAVIDOV AND S. PEDDADA 



tively, intersect the ellipsoid s T Ss = 1 at the points pi and P2. It follows 
that x(pi) = x(p2), moreover pi and p2 maximize x(-) on the ellipsoid. 
Now since p^5]pi = 1 = P2"5]p2 we must have pj8 = pj^- R- eca h that a 
linear function on ellipsoid is uniquely maximized (just like on a sphere; see 
the comment above) . Therefore we must have pi = P2 which implies that 
si = S2 as required. □ 

Proof of Theorem 3.4. If X = st Y, then for all s we have s T X = st 
s T Y. By assumption both s T X and s T Y are continuous RVs, so P(s T X < 
s T Y) = 1/2. Suppose now that both X ^ st Y and P(s T X < s T Y) > 1/2 for 
some s E 5+ , hold. Then we must have X -<i- B t Y. Since X ^ st Y we have 
Xj r^st Yj for l<j<p. One of these inequalities must be strict; otherwise 
X = st Y contradicts the fact that X -</- s t Y. Now use Theorem 1 in Davidov 
and Peddada (2011) to complete the proof. □ 

Proof of Theorem 3.5. The functions tp\ and V>2 defined in the proof 
of Theorem 3.1 are Donsker; cf. Example 19.7 in van der Vaart (2000). Hence 
by the theory of empirical processes applied to (A. 12), we find that 

(A.19) Ar 1 / 2 (M/ n ,, m (s)-M/(s))^G(s), 

where G(s) is a zero mean Gaussian process, and convergence holds for all 
s E S^T . We also note that (A.19) is a two-sample [/-processes. A central 
limit theorem for such processes is described by Neumeyer (2004). Hence by 
the continuos mapping theorem, and under Hq 7 we have -^^^(^^^(snxax) — 
1/2)) =4> sup sGi5 p-i G(s) where the covariance function of G(s), denoted by 

C (u, v) , is given by 

yP(u r Xi < u T X 2 , v T Xi < v T X 3 ) 

(A.20) 

+ P(u r Xi < u T X 2 , v T X 3 < v T X 2 ) ; -, 

1-A V ' ; 4A(1-A)' 

where Xi,X2,X3 are i.i.d. from the common DF. □ 

Proof of Theorem 3.6. Suppose that X -<i- st Y. Then for some s* E 
S^' 1 we have sjx -< st s^Y which implies that P(s^X < sjY) > 1/2. By 
definition P(s^ ax X < s^Y) > P(s^X < s^Y) so ^(s max ) > 1/2. It follows 
from the proof of Theorem 3.1 that ^n,m(s m ax) — > ^(smax) with probability 
one. Thus, 

5 , n,m = A rl/2 (^n,m(Smax) ~ 1/2) OO as W,m->00. 

Therefore by Slutzky's theorem, 

P(5 , n , m > q n , m> i- a ; Hi) ->• 1 as n,m oo, 
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where q n ,m,i-a is the critical value for an a level test based on samples of size 
n and m and <?n,m,i— a — ^ Qi— a.- 

Hence the test based on S nm is consistent. 
Consistency for I n ,m and Inm is established in a similar manner. 

Now assume that X z^- st Y z^- s t Z so that s T Y ^ st s T Z for all s £ 
Fix x.j, i = 1, ...,n, and choose s G Without any loss of generality 

assume that s T xi < s T X2 < ■•■ < s T x n . Define Uj = EiLi ^(s T x,<s T Y j ) and 
V^- = E?=i ^(s T x i <s T z j ) • Clearly C/j and Vj take values in J = {0, . . . , n}. Now, 
for k £ J we have 

F(C/j > fc) = P(s T Y i > s T x fc ) < P(s T Z j > s T x fc ) = F(Vj > k), 

where we use the fact that s T Y ^ st s T Z. It follows that Uj ^ s t Vj for 
j = 1, . . . ,m. Moreover {Uj} and {Vj} are all independent and it follows 
from Theorem 1.A.3 in Shaked and Shanthikumar (2007) that EfLi^j — st 

EjLi ^ - Thus E"=i ££i Vx^y,-) ^st E?=i ESliV*^^)- The lat - 

ter holds for every value of xi , . . . , x n , and therefore it holds unconditionally 
as well, that is, 

n m n m 

Yl Yl V'X^Yi) =^t X] I (s T X 1 <s^Z J )' 

It follows that ^^^(s) ^ st ^^m(s) for all s G where ^^^(s) and 

*n,m(s) are denned in (3.1) and the superscripts emphasize the different 
arguments used to evaluate them. Thus 

^XYz-XYn > ^X,Z/~X,Y\ , ^X.Z/^Zn 
^n,m \°max / — st ^n,m\ B max7 — st *n,ml B maxi 

and as a consequence ¥(Sn,m > q n ,m,i-a) < ^(S n ,m > q n ,m,i-a) as required. 
The monotonicity of the power function of I n ^ m and I^m follows immediately 

from the fact that *f> n ,m (s) ^ s t *n,m (s) for all se«S£ . □ 

Acknowledgments. We thank Grace Kissling (NIEHS), Alexander Gold- 
enshluger, Yair Goldberg and Danny Segev (University of Haifa), for their 
useful comments and suggestions. We also thank the Editor, Associate Edi- 
tor and two referees for their input which improved the paper. 

REFERENCES 

Abrevaya, J. and Huang, J. (2005). On the bootstrap of the maximum score estimator. 

Econometmca 73 1175-1204. MR2149245 
Arcones, M. A., Kvam, P. H. and Samaniego, F. J. (2002). Nonparametric estimation 

of a distribution subject to a stochastic precedence constraint. J. Amer. Statist. Assoc. 

97 170-182. MR1947278 
Audet, C, Bechard, V. and Le Digabel, S. (2008). Nonsmooth optimization through 

mesh adaptive direct search and variable neighborhood search. J. Global Optim. 41 

299-318. MR2398937 



40 



O. DAVIDOV AND S. PEDDADA 



Bekele, B. N. and Thall, P. F. (2004). Dose-finding based on multiple toxicities in a 
soft tissue sarcoma trial. J. Amer. Statist. Assoc. 99 26-35. MR2061885 

Bickel, P. J. and Sakov, A. (2008). On the choice of m in the m out of n bootstrap 
and confidence bounds for extrema. Statist. Sinica 18 967-985. MR2440400 

DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability. Springer, New 
York. MR2664452 

Davey, B. A. and Priestley, H. A. (2002). Introduction to Lattices and Order, 2nd ed. 
Cambridge Univ. Press, New York. MR1902334 

Davidov, O. (2012). Ordered inference, rank statistics and combining p- values: A new 
perspective. Stat. Methodol. 9 456-465. MR2871445 

Davidov, O. and Herman, A. (2011). Multivariate stochastic orders induced by case- 
control sampling. Methodol. Comput. Appl. Probab. 13 139-154. MR2755136 

Davidov, O. and Herman, A. (2012). Ordinal dominance curve based inference for 
stochastically ordered distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 74 825- 
847. 

Davidov, O. and Peddada, S. (2011). Order-restricted inference for multivariate binary 

data with application to toxicology. J. Amer. Statist. Assoc. 106 1394-1404. MR2896844 
Delagdo, M., Rodriguez-Poo, J. M. and Wolf, M. (2001). Subsampling inference 

in cube root asymptotics with an application to Manski's maximum score estimator. 

Econom. Lett. 73 241-250. 
Ding, Y. and Zhang, X. (2004). Some stochastic orders of Kotz-type distributions. 

Statist. Probab. Lett. 69 389-396. MR2091758 
Fang, K. T., Kots, S. and Ng, K. W. (1989). Symmetric Multivariate and Related 

Distributions. Chapman & Hall, London. 
Fisher, N. I. and Hall, P. (1989). Bootstrap confidence regions for directional data. 

J. Amer. Statist. Assoc. 84 996-1002. MR1 134489 
Hajek, J., Sidak, Z. and Sen, P. K. (1999). Theory of Rank Tests, 2nd ed. Academic 

Press, San Diego, CA. MR1680991 
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. 

J. Amer. Statist. Assoc. 58 13-30. MR0144363 
Hu, J., Homem-de-Mello, T. and Mehrotra, S. (2011). Concepts and applications 

of stochastically weighted stochastic dominance. Unpublished manuscript. Available at 

http : //www. optimization-online . org/DB_FILE/2011/04/2981 .pdf . 
Ivanova, A. and Murphy, M. (2009). An adaptive first in man dose-escalation study of 

NGX267: Statistical, clinical, and operational considerations. J. Biopharm. Statist. 19 

247-255. MR2518371 

Joe, H. (1997). Multivariate Models and Dependence Concepts. Chapman & Hall, London. 
MR1462613 

Johnson, R. and Wichern, D. (1998). Applied Multivariate Statistical Analysis. Prentice 
Hall, New York. 

Kim, J. and Pollard, D. (1990). Cube root asymptotics. Ann. Statist. 18 191-219. 
MR1041391 

KOSOROK, M. R. (2008). Introduction to Empirical Processes and Semiparametric Infer- 
ence. Springer, New York. MR2724368 

Lee, S. M. S. (1999). On a class of m out of n bootstrap confidence intervals. J. R. Stat. 
Soc. Ser. B Stat. Methodol. 61 901-911. MR1722246 

Lucas, L. A. and Wright, F. T. (1991). Testing for and against a stochastic order- 
ing between multivariate multinomial populations. J. Multivariate Anal. 38 167-186. 
MR1131713 



INFERENCE FOR THE LINEAR STOCHASTIC ORDER 



41 



Moser, V. (2000). Observational batteries in neurotoxicity testing. International Journal 

of Toxicology 19 407-411. 
Neumeyer, N. (2004). A central limit theorem for two-sample [/-processes. Statist. 

Probab. Lett. 67 73-85. MR2039935 
NTP (2003). NTP toxicology and carcinogenesiss studies of citral (microencapsulated) 

(CAS No. 5392-40-5) in F344/N rats and B6C3F1 mice (feed studies). 
Peddada, S. D. (1985). A short note on Pitman's measure of nearness. Arner. Statist. 39 

298-299. MR08 17793 

Peddada, S. D. and Chang, T. (1996). Bootstrap confidence region estimation of the 

motion of rigid bodies. J. Amer. Statist. Assoc. 91 231-241. MR1394077 
Pitman, E. J. G. (1937). The closest estimates of statistical parameters. Proceedings of 

the Cambridge Philosophical Society 33 212-222. 
Price, C. J., Reale, M. and Robertson, B. L. (2008). A direct search method for 

smooth and non-smooth unconstrained optimization problems. ANZIAM J. 48 927- 

948. 

Roy, S. N. (1953). On a heuristic method of test construction and its use in multivariate 

analysis. Ann. Math. Statist. 24 220-238. MR0057519 
Sampson, A. R. and Whitaker, L. R. (1989). Estimation of multivariate distributions 

under stochastic ordering. J. Amer. Statist. Assoc. 84 541-548. MR1010343 
Sen, B., Banerjee, M. and Woodroofe, M. (2010). Inconsistency of bootstrap: The 

Grenander estimator. Ann. Statist. 38 1953-1977. MR2676880 
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New 

York. MR0595165 

Shared, M. and Shanthikumar, J. G. (2007). Stochastic Orders. Springer, New York. 
MR2265633 

Sherman, R. P. (1993). The limiting distribution of the maximum rank correlation esti- 
mator. Econometrica 61 123-137. MR1201705 
VAN der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge Univ. Press, Cambridge. 

Department of Statistics Biostatistics Branch 

University of Haifa National Institute 

Mount Carmel, Haifa 31905 of Environmental Health Sciences 

Israel 111 T.W. Alexander Drive 

E-mail: davidov@stat.haifa.ac.il Research Triangle Park, North Carolina 27709 

USA 

E-MAIL: peddada@niehs.nih.gov 



